Training Agents to Perform Sequential Behavior

نویسندگان

  • Marco Colombetti
  • Marco Dorigo
چکیده

This paper is concerned with training an agent to perform sequential behavior. In previous work we have been applying reinforcement learning techniques to control a reactive robot. Obviously, a pure reactive system is limited in the kind of interactions it can learn. In particular, it can only learn what we call pseudo-sequences, that is sequences of actions in which the transition signal is generated by the appearance of a sensorial stimulus. We discuss the difference between pseudo-sequences and proper sequences, and the implication that these differences have on training procedures. A result of our research is that, in case of proper sequences, for learning to be successful the agent must have some kind of memory; moreover it is often necessary to let the trainer and the learner communicate. We study therefore the influence of communication on the learning process. First we consider trainer-to-learner communication introducing the concept of reinforcement sensor, which let the learning robot explicitly know whether the last reinforcement was a reward or a punishment; we also show how the use of this sensor induces the creation of a set of error recovery rules. Then we introduce learner-to-trainer communication, which is used to disambiguate indeterminate training situations, that is situations in which observation alone of the learner behavior does not provide the trainer with enough information to decide if the learner is performing a right or a wrong move. All the design choices we make are discussed and compared by means of experiments in a simulated world. _______________________________ * This work has been partly supported by the Italian National Research Council, under the "Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo", subproject 2 "Processori dedicati", and under the "Progetto Finalizzato Robotica", subproject 2 "Tema: ALPI". + Progetto di Intelligenza Artificiale e Robotica, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy (e-mail: [email protected]). # International Computer Science Institute, Berkeley, CA 94704, and Progetto di Intelligenza Artificiale e Robotica, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy (e-mail: [email protected]).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Training a Tetris agent via interactive shaping: a demonstration of the TAMER framework

As computational learning agents continue to improve their ability to learn sequential decision-making tasks, a central but largely unfulfilled goal is to deploy these agents in real-world domains in which they interact with humans and make decisions that affect our lives. People will want such interactive agents to be able to perform tasks for which the agent’s original developers could not pr...

متن کامل

Multiagent Supervised Training with Agent Hierarchies and Manual Behavior Decomposition

We present a supervised learning from demonstration system capable of training stateful and recurrent behaviors, both in the single agent and multiagent case. Furthermore, behavior complexity due to statefulness and multiple agents can result in a high dimensional learning space, which can require many samples to learn properly. Our approach, which relies heavily on both per-agent behavior deco...

متن کامل

Are People Successful at Learning Sequential Decisions on a Perceptual Matching Task?

Sequential decision-making tasks are commonplace in our everyday lives. We report the results of an experiment in which human subjects were trained to perform a perceptual matching task, an instance of a sequential decision-making task. We use two benchmarks to evaluate the quality of subjects’ learning. One benchmark is based on optimal performance as defined by a dynamic programming procedure...

متن کامل

Interactive Learning for Sequential Decisions and Predictions

Sequential prediction problems arise commonly in many areas of robotics and information processing: e.g., predicting a sequence of actions over time to achieve a goal in a control task, interpreting an image through a sequence of local image patch classifications, or translating speech to text through an iterative decoding procedure. Learning predictors that can reliably perform such sequential...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Adaptive Behaviour

دوره 2  شماره 

صفحات  -

تاریخ انتشار 1994